Fluctuations of the longest common subsequence for sequences of independent blocks
Abstract
The order of the fluctuation of the Longest Common Subsequence (LCS) of two independent sequences has been an open problem for decades, and the conjectures on the topic contradict each other [1], [2]. Lember and Matzinger [3] showed that, for i.i.d. binary strings, the variance of the length of the LCS grows linearly in the length of the strings (equivalently, the standard deviation is of the order of the square root of the length), provided that 0 and 1 have very different probabilities. Nonetheless, for two i.i.d. sequences over a finite number of equiprobable symbols, the typical size of the fluctuation of the LCS remains unknown. In the present article, we determine the order of the fluctuation of the LCS for a special model of i.i.d. sequences made out of blocks. A block is a contiguous substring consisting of only one type of symbol. Our model allows only three possible block lengths, each chosen with equal probability. For i.i.d. sequences with equiprobable symbols, the blocks are independent of each other. To study the fluctuation of the LCS in this model, we develop a method that reformulates the fluctuation problem as a (relatively) low-dimensional optimization problem. Finally, we prove that, for our model, the order of the fluctuation of the length of the LCS agrees with Waterman's conjecture [2]. We believe that our method can be applied to any other case dealing with i.i.d. sequences, although the optimization problem might be more complicated to formulate and to solve.
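The block model described in the abstract can be explored numerically. The following is only a minimal sketch and not the method of the article: it assumes a binary alphabet, symbols that alternate from one block to the next, and the hypothetical block lengths (1, 2, 3), none of which are specified in the abstract. It also fixes the number of blocks rather than the string length, which is a simplification. The function names `block_sequence`, `lcs_length`, and `lcs_std` are illustrative.

```python
import random
import statistics


def block_sequence(num_blocks, block_lengths=(1, 2, 3), rng=random):
    """Build a binary string from blocks whose lengths are i.i.d. and uniform.

    Assumption: consecutive blocks alternate between '0' and '1', and the
    three allowed lengths (1, 2, 3) are placeholders, not the article's values.
    """
    symbol = rng.choice("01")
    parts = []
    for _ in range(num_blocks):
        parts.append(symbol * rng.choice(block_lengths))
        symbol = "1" if symbol == "0" else "0"
    return "".join(parts)


def lcs_length(x, y):
    """Length of a longest common subsequence, by the standard row-wise DP."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_std(num_blocks, trials=200, seed=0):
    """Monte Carlo estimate of the standard deviation of the LCS length."""
    rng = random.Random(seed)
    samples = [
        lcs_length(
            block_sequence(num_blocks, rng=rng),
            block_sequence(num_blocks, rng=rng),
        )
        for _ in range(trials)
    ]
    return statistics.pstdev(samples)
```

Calling `lcs_std` for increasing values of `num_blocks` gives a rough empirical picture of how the fluctuation grows. Such a simulation can only suggest an order of magnitude; it does not replace the optimization argument of the article.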
Related resources
Random modification effect in the size of the fluctuation of the LCS of two sequences of i.i.d. blocks
The problem of the order of the fluctuation of the Longest Common Subsequence (LCS) of two independent sequences has been open for decades. There exist contradicting conjectures on the topic, [1] and [2]. In the present article, we consider a special model of i.i.d. sequences made out of blocks. A block is a contiguous substring consisting only of one type of symbol. Our model allows only three...
Finding Longest Common Increasing Subsequence for Two Different Scenarios of Non-random Input Sequences
By reviewing the Longest Increasing Subsequence (LIS) and Longest Common Subsequence (LCS) problems, the Longest Common Increasing Subsequence (LCIS) problem is explored in detail for two non-random input cases. Specifically, we designed two algorithms, one for the case in which one sequence is ordered and duplicate elements are allowed in each sequence, and the second ...
The Longest Common Subsequence Problem for Arc-
Arc-annotated sequences are useful in representing the structural information of RNA and protein sequences. Recently, the longest arc-preserving common subsequence problem has been introduced in [6, 7] as a framework for studying the similarity of arc-annotated sequences. In this paper, we consider arc-annotated sequences with various arc structures and present some new algorithmic and complexit...
The Longest Common Subsequence Problem
Algorithms on sequences of symbols have been studied for a long time and now form a fundamental part of computer science. One of the most important problems in the analysis of sequences is the longest common subsequence problem. For the general case of an arbitrary number of input sequences, the problem is NP-hard. We describe an approach to solve this problem. This approach is based on constructin...
The Longest Common Subsequence Problem for Arc-Annotated Sequences
Arc-annotated sequences are useful in representing the structural information of RNA and protein sequences. The longest arc-preserving common subsequence problem has been introduced in [1], [2] as a framework for studying the similarity of arc-annotated sequences. In this paper, we consider arc-annotated sequences with various arc structures. Mathematics Subject Classification: 68Q15
Publication date: 2010